DOCUMENT RESUME 



ED 393 92' 



TM 024 937 



AUTHOR       Donahue, Lisa M.; And Others
TITLE        An Analysis of the Effects of Untranslated Behavioral
             Checklists on the Psychometric Properties of
             Assessment Centers.
PUB DATE     Jun 95
NOTE         41p.; Paper presented at the Annual Meeting of the
             International Personnel Management Association
             Assessment Council Conference (New Orleans, LA, June
             25-29, 1995).
PUB TYPE     Reports - Evaluative/Feasibility (142) --
             Speeches/Conference Papers (150)



EDRS PRICE   MF01/PC02 Plus Postage.
DESCRIPTORS  *Assessment Centers (Personnel); *Behavior Rating
             Scales; *Check Lists; *Evaluation Methods; Field
             Tests; Occupational Tests; Personnel Evaluation;
             Personnel Selection; *Police; Psychometrics; Scores;
             Simulation; *Validity



ABSTRACT 



A field study with 178 candidates for a police
promotional examination was conducted to investigate the effects of
"untranslated" behavioral checklists on certain psychometric
properties of an assessment center. The untranslated checklist used
all behavioral responses elicited by the assessment center exercises,
not just those that met a retranslation criterion of categorizing
into dimensions. The study examined whether similar convergence of
dimensions across exercises could be obtained across four job
simulation exercises that varied greatly in content. The reliability
of the behavioral checklist and criterion-related validity were
evaluated by comparing the checklist to a conventional graphic rating
scale format. The results suggest that the untranslated behavioral
checklists improved the discriminant validity and reliability of
dimension scores over a traditional graphic rating scale, but did not
have a corresponding effect on the convergent validity of dimension
scores. In addition, the untranslated behavioral checklist did not
yield a significant relationship with performance. It is suggested
that behavioral checklists have many benefits, and thus are very
appropriate as a method of evaluating assessment center exercises. An
appendix lists definitions of assessment center dimensions. (Contains
7 tables and 21 references.) (Author/SLD)




Reproductions supplied by EDRS are the best that can be made from the original document.












An Analysis of the Effects of Untranslated Behavioral Checklists
on the Psychometric Properties of Assessment Centers 



Lisa M. Donahue, Melanie C. Jones 
New Orleans Civil Service Department 



Donald M. Truxillo, Ph.D.
Portland State University 



Nancy B. Goldstein 



Tulane University 



Paper presented at the annual International Personnel Management Association
Assessment Council Conference on Public Personnel Assessment, New Orleans, LA, June 1995.














Analysis of Untranslated Behavioral Checklists



Abstract 

A field study was conducted to investigate the effects of "untranslated" behavioral checklists on
certain psychometric properties of an assessment center. The results suggest that the
behavioral checklists improved the discriminant validity and reliability of dimension
scores over a traditional graphic rating scale, but did not have a corresponding effect on the
convergent validity of dimension scores. In addition, the untranslated behavioral checklist did
not yield a significant relationship with performance. It is suggested that behavioral checklists
have many benefits, and thus are very appropriate as a method of evaluating assessment center
exercises. Possible areas of future research and additional practical considerations are discussed.







An Analysis of the Effects of Untranslated Behavioral Checklists 
on the Psychometric Properties of Assessment Centers 

Since the 1950s, assessment centers have been used extensively for purposes such as
selection, promotion, training, and career development (Schmidt, Ones, & Hunter, 1992) for a
wide variety of positions from professional-level personnel to production line workers (Reilly,
Henry, & Smither, 1990). They have been used in a host of organizational settings including
manufacturing, government, military, and educational settings (Klimoski & Brickner, 1987).
A recent meta-analysis of assessment center validity conducted by Gaugler, Rosenthal, Thornton,
and Bentson (1987) supports the widespread use of assessment centers for predicting
performance in a variety of jobs and organizations. Specifically, Gaugler et al. (1987) reported
an average corrected validity coefficient of .37 for the fifty assessment centers examined. These
meta-analytic results and others (Hunter & Hunter, 1984; Schmitt, Gooding, Noe, & Kirsch,
1984) have established the predictive validity of assessment centers.



Construct Validity

However popular and effective a selection and diagnostic tool, assessment centers remain
"the modern enigma in human resource practice" (Klimoski & Brickner, 1987). Very little is
known about why assessment centers yield predictive validity. Assessment centers are designed
to produce standardized measures of separate constructs thought to represent various job-related
abilities associated with successful performance (Byham, 1980). However, the preponderance
of research suggests that assessment centers may not measure the constructs they purport to
measure (Sackett & Dreher, 1982; Turnage & Muchinsky, 1982; Silverman, Dalessio, Woods,
& Johnson, 1986; Robertson, Gratton, & Sharpley, 1987; Bycio, Alvares, & Hahn, 1987).
Indeed, this problem with construct validity led Sackett and Dreher (1982) to conclude that there
was "virtually no support for the view that the assessment center technique generated dimension
scores that can be interpreted as representing complex constructs" (p. 409). Thus, the predictive
validity of assessment centers cannot be attributed to the successful measurement of job-related
abilities, but instead is thought to be associated with factors that are not well understood
(Klimoski & Brickner, 1987).

Many investigators (e.g., Sackett & Dreher, 1982; Silverman et al., 1986; Turnage &
Muchinsky, 1982) have used the multitrait-multimethod matrix approach (cf. Campbell & Fiske,
1959) to examine the construct validity of assessment centers. To meet the requirements of
construct validity according to this approach, ratings on the same dimension should be
significantly correlated across exercises (convergent validity). In addition, these across-exercise
correlations must be greater than the within-exercise correlations among different dimensions
(discriminant validity). The multitrait-multimethod research investigating the internal construct
validity of assessment centers, however, has consistently shown higher within-exercise
correlations of different dimensions than across-exercise correlations of the same dimensions
(Robertson et al., 1987; Sackett & Dreher, 1982; Silverman et al., 1986; Turnage & Muchinsky,
1982).




Neidig and Neidig (1984) contend that the high within-exercise correlations among
different assessment center dimensions reported in the above research do not demonstrate
measurement error, but instead represent a true exercise effect. According to Neidig and
Neidig, the inclusion of multiple exercises in an assessment center is intended to assess behavior
in a variety of job-related contexts, and "stable performance across exercises by all participants
is not necessarily expected" (p. 184). Some assessees, for example, may be more effective in
group exercises, whereas others may perform best on individual exercises. Thus, a lack of
consistency in assessee behavior across exercises may reflect differences in individual
effectiveness in various situations (Neidig & Neidig, 1984). Furthermore, Neidig and Neidig
contend that the lack of behavioral consistency across exercises may also be related to the
situational specificity of the exercises. Thus, the situational context determines the nature
of a dimension in terms of specific behaviors. Leadership behaviors for one assessment center
exercise, for instance, may be manifested very differently from those of another exercise because
of the situational specificity of the exercises. Thus, lack of convergence of dimension ratings
across exercises may not be a flaw in assessment centers, but instead may represent an expected
lack of consistency in assessee behavior due to the situational specificity of assessment center
exercises (Neidig & Neidig, 1984).

Unlike Neidig and Neidig, Sackett and Dreher (1982, 1984) are concerned with the
ability of assessment centers to measure intended constructs and seriously question the
psychological meaning of assessment center dimension ratings that are virtually uncorrelated.
Thus, Sackett and Dreher (1982, 1984) argue against assessment center designers claiming to
measure the intended constructs on content-validity grounds when the available empirical
evidence does not support the consistency of dimensional performance across exercises (p.
187). Given the lack of construct validity evidence, Sackett and Dreher conclude
that content-oriented exercise design is not sufficient to demonstrate the job-relatedness of
assessment center exercises. Alternatively, they contend that additional validation evidence,
either construct or criterion-related, is crucial.

In response to Sackett and Dreher's (1982) conclusions, Neidig and Neidig (1984) argued
that the inability of assessment center exercises to meet the requirements of construct validity
does not call into question the job-relatedness of assessment center methods. Given the
overwhelming evidence of an exercise effect and the fact that assessment center exercises are
essentially samples of job-related behavior, Neidig and Neidig (1984) and others (Byham, 1980;
Haymaker & Grant, 1982; Jaffee & Sefcik, 1980; Schmitt & Noe, 1988) believe that the
job-relatedness of assessment centers should be established on the grounds of content validity by
treating individual exercises as work sample tests and using Subject Matter Experts to document
the relationship between the content of the job, the assessment center dimensions, and the nature
of the exercises (Byham, 1980; Haymaker & Grant, 1982). Sackett and Dreher (1984) concede
that when assessment centers are used as a sample of present behavior, rather than a sign of
future performance, a content validation strategy may be most appropriate for demonstrating the
job-relatedness of assessment center exercises.




Some researchers have not abandoned attempts to establish the construct validity of
assessment centers in favor of relying on content-oriented construction to demonstrate
job-relatedness. Instead, these researchers contend that the failure of assessment centers to produce
convergent validity of dimension ratings is due to the cognitive complexity of the rating task
(Gaugler & Thornton, 1989; Reilly et al., 1990; Silverman et al., 1986). According to Gaugler
and Thornton (1989), the job of assessors is exceedingly complex and may overwhelm their
limited information processing capabilities. In a typical assessment center, for instance,
assessors must observe and record the performance of candidates on situational exercises,
classify the observed behaviors into dimensions, and then rate each candidate on each dimension
(Gaugler & Thornton, 1989).

Recently, much attention has been given to the influence of assessment center
methodology (Gaugler & Thornton, 1989; Silverman et al., 1986; Reilly et al., 1990) on the
cognitive complexity of the rating task and the construct validity of dimension ratings. For
example, the research conducted by Reilly et al. (1990) is of particular interest to the present
study because Reilly et al. investigated the effects of behavioral checklist rating scales on the
convergent and discriminant validity of assessment center dimension ratings. To develop the
checklists, Reilly et al. instructed assessors to complete a first set of assessments for two group
exercises and identified specific behavioral responses to each of the exercises. The behaviors
were then categorized into dimensions within each exercise. The final behaviors comprising
the checklist for each of the group exercises were those that met a criterion of 80% agreement
among the assessors.
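
The retranslation criterion described above can be illustrated with a minimal sketch. It assumes that each assessor independently sorts every behavior into one dimension and that a behavior is retained only when at least 80% of assessors choose the same dimension. The function name, the behaviors, and the dimension labels are hypothetical, and the sketch is not a reproduction of Reilly et al.'s actual procedure.

```python
from collections import Counter


def retranslate(assignments, threshold=0.80):
    """Return {behavior: dimension} for behaviors that meet the agreement threshold.

    assignments maps each behavior to the list of dimensions chosen by the
    individual assessors (one entry per assessor).
    """
    kept = {}
    for behavior, dimensions in assignments.items():
        # Most frequently chosen dimension and the number of assessors choosing it
        dimension, votes = Counter(dimensions).most_common(1)[0]
        if votes / len(dimensions) >= threshold:
            kept[behavior] = dimension
    return kept


# Hypothetical sortings by five assessors (illustrative only, not study data)
assignments = {
    "asks a quiet member for input": ["Leadership"] * 5,
    "summarizes the group's progress": ["Leadership"] * 4 + ["Oral Communication"],
    "proposes an assembly sequence": [
        "Problem Analysis", "Problem Analysis", "Problem Analysis",
        "Leadership", "Organization",
    ],
}

checklist = retranslate(assignments)
dropped = sorted(set(assignments) - set(checklist))
```

In this illustration the third behavior reaches only 60% agreement and is dropped from the checklist, so a candidate who displays it would receive no credit for it.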






Reilly et al.'s (1990) findings suggest that the introduction of the behavioral checklist
significantly improved convergent validity over prechecklist ratings (from .24 to .43). In
addition, the convergent validity of their dimension ratings was slightly higher than the
discriminant validity of assessment ratings (.43 versus .41). On the basis of these results, Reilly
et al. concluded that the use of behavioral checklists alleviated the cognitive demands placed on
raters by focusing their attention on specific sets of behaviors relevant to the dimensions
assessed. By categorizing behaviors by dimension, the retranslation process further reduced
cognitive processing by eliminating the need for assessors to classify behaviors into their relevant
dimensions.



While Reilly et al.'s (1990) results are encouraging from the standpoint of improving the
pattern of convergent and discriminant validity among assessment center dimensions, the
retranslation procedure employed by Reilly et al. eliminated 111 critical behavioral responses
identified by the assessors. The omission of these behaviors from the behavioral checklist scales
calls into question not only the content validity of the rating procedure, but also its fairness in
evaluating candidates, since no credit would be given for responses not meeting the 80%
retranslation criterion. Silverman et al. (1987) remind us of the importance of evaluating
assessment center methodology on overall dimension scores because selection and promotion
decisions ultimately rest on these overall ratings. Thus, a method that would simultaneously
improve the construct validity of dimension ratings while giving credit for all content-valid
responses elicited by candidates would satisfy not only the requirements of construct validity,
but also those of content-oriented test construction.




The present study involves the evaluation of a method that attempts to obtain similar
gains in convergent validity and discriminant validity as those reported by Reilly et al. (1990)
without sacrificing content validity and fairness in evaluating candidates. Specifically, the
present study explored the effects of an "untranslated" behavioral checklist on the construct
validity of dimension ratings. The untranslated behavioral checklist used in the present
investigation attempted to include all behavioral responses elicited by the assessment center
exercises, not just those meeting a retranslation criterion. In addition, the present study
attempted to extend the work of Reilly et al. (1990) in two critical areas not explored. First,
Reilly et al. investigated the convergence of dimension ratings for two group exercises, both of
which involved an assembly problem. Thus, the convergence of dimension ratings could be
expected given the similarities in the situational contexts of the exercises. The present study
examined whether similar convergence of dimensions across exercises could be obtained across
four job simulation exercises that varied greatly in content. Second, Reilly et al. (1990) were
unable to present findings regarding certain psychometric properties of the behavioral checklist.
The present study evaluated both the behavioral checklist's reliability and criterion-related
validity by comparing the checklist to a conventional graphic rating scale format on these
psychometric characteristics. Criterion validity is of special interest given the argument that the
predictive validity of assessment centers may be related to subtle criterion contamination
(Klimoski & Brickner, 1987). Reilly et al. (1990) suggest that criterion validation studies
including both a behavioral checklist and conventional scale ratings may help to determine if the
well-established relationship between overall dimension ratings and performance is actually due
to criterion contamination in which the ratings capture subtle factors that are unrelated to
effective task performance (e.g., presentation skills), but may attract high performance ratings 
in an organizational setting. 

Given the evidence of the reliability (Neidig & Neidig, 1984) and criterion validity
(Gaugler et al., 1987; Hunter & Hunter, 1984; Schmitt et al., 1984) of assessment center ratings
in the existing literature, similar results were expected regarding the effects of the "untranslated"
behavioral checklist on these psychometric properties. In addition, it was hypothesized that the
use of the "untranslated" behavioral checklist would produce similar patterns of convergent and
discriminant validity reported by Reilly et al. Such findings were anticipated given that the
"untranslated" behavioral checklist scales were expected to offer the same "cognitive-reduction"
advantages as those constructed by Reilly et al. through the use of the retranslation procedure.
Specifically, these advantages include (1) focusing assessors' attention directly on specific
behavioral responses elicited by the exercises and (2) organizing these behaviors according to
the operational definitions of the dimensions, thereby eliminating the need for such categorization
by the assessors. Such results would not only support Reilly et al.'s findings and bolster the use
of behavioral checklists for improving the construct validity of assessment center exercises, but
would also eliminate the need for the retranslation process in developing behavioral checklist
scales. Eliminating this process would simultaneously address the concerns associated with the
content validity and fairness of the evaluation process.






METHOD 



Participants 

Assessees (N = 178) were candidates for a police promotional examination. The candidates
included 164 males, 14 females, 132 whites, and 46 minorities. Assessors (N = 41) were
Captains and Majors representing various police departments across the country. The assessors
included 34 males, 7 females, 29 whites, and 12 minorities. Each candidate was assessed by
a team of two assessors assigned to one of four situational exercises comprising the assessment
center. The number of teams for each exercise ranged from three to eight, depending on
the complexity of the exercise and the rating task. Each team of raters evaluated an average of
forty-three (43) candidates using both a behavioral checklist and a graphic rating scale.



Procedure 

Exercises and dimensions. The pairs of assessors observed and rated the candidates on
one of four job simulation exercises. The exercises consisted of three situational videos and an
in-basket, and were developed on the basis of job analysis information and direct input from
local subject matter experts (SMEs). Each of the situational videos depicted job-related
scenarios that unfolded across multiple scenes. The scenarios portrayed in the situational videos
included: apprehending a fleeing suspect involved in an armed robbery, counseling a subordinate
with a suspected drug addiction, and directing crowd control activities at an abortion clinic
protest.




In responding to the situational videos, the candidates were required to assume the role
of the target position, analyze the situation presented in each scene, and state the actions they
would take in response to the situation. Time limits for each scene varied according to their
complexity from two to four minutes. Candidates' responses to each scene of the situational
videos were videotaped to be subsequently rated by assessors.

The in-basket exercise included a sample of the memos, forms, reports, and other
paperwork typically found in the target position's in-basket. Additional job-related situations
were also presented in the in-basket. Examples of these situations include: evidence of declining
performance of an officer, information suggesting the need for platoon training in report writing,
and indications of possible sick leave abuse by some platoon members.

The candidates were given two and a half hours to analyze all the in-basket items and
to prepare their responses to the items. The candidates were then given forty-five minutes to
present their responses, which were also videotaped to be rated later by assessors.

The above job simulation exercises were designed to measure nine dimensions identified
through job analysis procedures as representing job-related abilities required for effective
performance in the target position. These dimensions were (1) Interpersonal; (2) Development
of Subordinates; (3) Leadership and Delegation; (4) Problem Analysis & Decision Making; (5)
Organization & Coordination; (6) Investigation & Police Work; (7) Oral Communication; (8)
Control and Follow-Up; and (9) Use of Police References and Quantitative Resources. (See
Appendix for the definitions of the dimensions.)

All of these nine dimensions were measured in the in-basket exercise and one of the
situational video exercises. The two remaining video exercises measured only the first eight
dimensions and did not assess Use of Police References and Quantitative Resources. A tenth
dimension identified through the job analysis, Written Communication, was measured by a
written exercise contained within the in-basket. This dimension was not included in the present
study, however, because it was not assessed in more than one exercise.

Behavioral checklist construction. Behaviorally specific responses used to develop the
behavioral checklist scales for each of the four exercises were generated from two sources.
Prior to the administration of the assessment center, local SMEs who assisted in the
development of the exercises were polled for examples of poor, average, and excellent responses
to the exercises. The SMEs were then asked to (1) categorize the responses into their relevant
dimensions and (2) assign the responses a weight on a scale from -1 to 3, where "-1" is a
response that would have an adverse or negative effect on the situation, "0" is a response that
would have no effect on the situation, "1" is a response that is the least preferable or acceptable
in the situation, "2" is an average or standard response for the situation, and "3" is an excellent
response in the situation. The SMEs' assignment of dimension and ratings to each response was
then used to assist the test development staff in developing the behavioral checklist scales for
each of the assessment center exercises.

The candidates represented the second source of behavioral responses used in the
construction of the behavioral checklists. Following the administration of the assessment center,
test development staff members listened to a random selection of candidates' taped performances
in order to collect additional responses to each of the exercises. Approximately five percent
(N = 40) of the candidates' taped performances were reviewed until novel responses were no
longer identified.






Because test development procedures precluded using local SMEs after the administration
of the assessment center, test development staff members were used to categorize
candidate-generated responses into dimensions. Following exercise-specific training (described below),
the assessors were presented with both the SME- and candidate-generated responses as well as
the value of each response assigned by the SMEs. They were then instructed to independently
weight each response using the -1 to 3 scale described above. A round-robin technique was then
used in which individual assessors stated their ratings. Assessors then engaged in discussion
regarding discrepancies in their ratings until a consensus was reached. The final weight for each
checklist response was the consensus weight assigned by the assessors. The combined use of
these two sources of responses ensured that a near exhaustive list of content valid responses was
included on the behavioral checklists.

Assessor training. All teams of assessors participated in a one-day training session
designed to standardize evaluation approaches and increase the accuracy of ratings. The training
session was divided into two segments which included general and exercise-specific training.
In general training, assessors received information regarding the target position and structure of
the organization. In addition, assessors were instructed on methods of observation and
note-taking, the use of rating forms, and the consensus process. Exercise-specific training
consisted of reviewing the operational definitions of the dimensions to be assessed, in addition
to reviewing each scene of the job simulation exercise and its corresponding behavioral checklist
scales. At the end of exercise-specific training, assessors independently rated a hypothetical
candidate. Discussion then followed in which the assessors received feedback regarding the
accuracy of their ratings.





Rating procedures. Pairs of assessors for each job simulation exercise observed and
recorded the candidates' responses to each scene. Because the candidates' performances were
videotaped, the assessors could review their responses as many times as necessary. Immediately
after reviewing each candidate's presentation of responses to a scene, the assessors independently
completed the behavioral checklist scales for the scene by marking all responses elicited by the
candidate. Following the completion of the behavioral checklist, assessors completed the graphic
rating scale by making an overall rating for each dimension assessed by the exercise.

Final dimension scores for the behavioral checklist were calculated by summing
individual response scores (weighted -1 to 3) for each dimension. These behavioral checklist
sums (BCS) for the dimensions were then combined across the two assessors. Dimension scores
for the graphic rating scales were represented by the overall rating given each dimension. These
ratings were also combined across the two assessors.
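
The scoring rule for the behavioral checklist sums can be sketched as follows. The sketch assumes that "combined across the two assessors" means a simple sum; the text does not say whether scores were summed or averaged. The checklist items, their weights, their dimension assignments, and the function names are hypothetical and serve only to show the arithmetic.

```python
from collections import defaultdict


def checklist_sums(marked, weights, dimension_of):
    """Sum the weights (-1 to 3) of the marked responses within each dimension."""
    sums = defaultdict(int)
    for response in marked:
        sums[dimension_of[response]] += weights[response]
    return dict(sums)


def combine(assessor_a, assessor_b):
    """Combine two assessors' dimension sums (assumed here to be a simple sum)."""
    dimensions = set(assessor_a) | set(assessor_b)
    return {d: assessor_a.get(d, 0) + assessor_b.get(d, 0) for d in dimensions}


# Hypothetical checklist for one scene (illustrative only, not study data)
weights = {
    "calls for backup": 3,
    "sets up a perimeter": 2,
    "pursues the suspect alone": -1,
    "briefs officers clearly": 2,
}
dimension_of = {
    "calls for backup": "Leadership and Delegation",
    "sets up a perimeter": "Organization & Coordination",
    "pursues the suspect alone": "Problem Analysis & Decision Making",
    "briefs officers clearly": "Oral Communication",
}

# Responses each assessor marked for the same candidate
first = checklist_sums(
    ["calls for backup", "sets up a perimeter", "pursues the suspect alone"],
    weights, dimension_of,
)
second = checklist_sums(
    ["calls for backup", "briefs officers clearly"],
    weights, dimension_of,
)
bcs = combine(first, second)
```

Because every marked response contributes its weight, a candidate receives credit (or a penalty) for each listed behavior, which is the property the untranslated checklist was intended to preserve.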






Results 



Overview

Several analyses were performed to determine if the use of the untranslated behavioral
checklist was associated with improvements in the psychometric properties of assessment center
ratings. First, both the multitrait-multimethod matrix (cf. Campbell & Fiske, 1959) and factor
analysis approaches were used to compare the construct validity of the behavioral checklist sums
(BCS) to the graphic rating scale dimension scores. Second, both internal consistency
(coefficient alpha) and interrater reliability indices were used to examine the reliability of the
BCS in relation to the graphic rating scale dimension scores. Finally, a concurrent validation
approach was used to identify any differences in the criterion validity of the total scores
produced by the behavioral checklist and the graphic rating scale.



Analysis I: Multitrait-Multimethod Matrix

The convergent and discriminant validity of the BCS and graphic rating scale dimension
scores was calculated according to the multitrait-multimethod matrix approach (cf. Campbell &
Fiske, 1959). To examine the scales' convergent validity, the correlations among the same
dimensions measured across each of the exercises (monotrait-heteromethod correlations) were
calculated. These mean monotrait-heteromethod correlations for each dimension of the
behavioral checklist and graphic rating scale are listed in Table 1. As Table 1 illustrates, the



Insert Table 1 about here





mean monotrait-heteromethod correlation of the BCS and the graphic rating scale
dimension scores was .254 (range = .179 to .336; SD = .074) and .301 (range = .236 to .376;
SD = .042), respectively.

The discriminant validity of the dimension scores produced by the behavioral checklist
and graphic rating scale was assessed using two methods recommended by the
multitrait-multimethod matrix approach (cf. Campbell & Fiske, 1959). First, correlations among different
dimensions measured in the different exercises (heterotrait-heteromethod correlations) were
calculated and compared to the monotrait-heteromethod correlations. As presented in Table 1,
the grand mean heterotrait-heteromethod correlation was .236 (SD = .090) for the BCS and was
.297 (SD = .074) for the graphic rating scale. Table 1 also illustrates that these grand mean
heterotrait-heteromethod correlations were slightly lower than their monotrait-heteromethod
counterparts, especially for the BCS. This suggests some evidence of discriminant validity for
both the BCS and graphic rating scale dimension scores.

The second, more stringent method used to assess the discriminant validity of the BCS
and the graphic rating scale dimension scores involved calculating the correlations among the
different dimensions within each of the exercises (heterotrait-monomethod correlations). The
mean heterotrait-monomethod correlations for each of the exercises appear in Table 1 for both
the behavioral checklist and the graphic rating scale. As illustrated, the grand mean
heterotrait-monomethod correlation was .393 (range = .27[?] to .482; SD = .103) for the BCS and .613
(range = .559 to .719; SD = .073) for the graphic rating scale dimension scores.

The results of the multitrait-multimethod approach indicate that the use of the behavioral
checklist resulted in improvements in the discriminant validity of the dimension scores.




Applying the more rigorous criterion for discriminant validity, the results clearly show a smaller
average within-exercise (heterotrait-monomethod) correlation for the BCS (.393), as compared
to the graphic rating scale dimension scores (.613). The multitrait-multimethod results did not
suggest, however, that the behavioral checklist was associated with gains in convergent validity.
Specifically, the average across-exercise (monotrait-heteromethod) correlations for both the BCS
and the graphic rating scale dimension scores are considerably lower than their within-exercise
(heterotrait-monomethod) counterparts, suggesting poor convergent validity. Moreover, the
across-exercise (monotrait-heteromethod) correlations associated with the BCS were lower
than those of the graphic rating scale dimension scores (.254 vs. .301).
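
The three kinds of correlations compared in this analysis can be illustrated with a minimal sketch. It assumes a fully crossed design (every dimension scored in every exercise), which was not the case in the present study, and it uses simulated scores with a deliberately strong exercise effect. The function name `mtmm_summary`, the column ordering, and all data are hypothetical; this is not the authors' procedure.

```python
import numpy as np

def mtmm_summary(scores, n_dims, n_exercises):
    """Average the three classes of MTMM correlations.

    scores: array (candidates, n_exercises * n_dims), columns ordered
    exercise by exercise (assumed layout: exercise-major).
    """
    r = np.corrcoef(scores, rowvar=False)
    buckets = {
        "monotrait-heteromethod": [],    # same dimension, different exercises
        "heterotrait-heteromethod": [],  # different dimension, different exercises
        "heterotrait-monomethod": [],    # different dimensions, same exercise
    }
    n = n_dims * n_exercises
    for i in range(n):
        for j in range(i + 1, n):
            ex_i, dim_i = divmod(i, n_dims)
            ex_j, dim_j = divmod(j, n_dims)
            if ex_i == ex_j:
                key = "heterotrait-monomethod"
            elif dim_i == dim_j:
                key = "monotrait-heteromethod"
            else:
                key = "heterotrait-heteromethod"
            buckets[key].append(r[i, j])
    return {k: float(np.mean(v)) for k, v in buckets.items()}

# Simulated data: each score = exercise-wide effect + independent noise.
rng = np.random.default_rng(0)
n_candidates, n_dims, n_exercises = 500, 3, 4
exercise_effect = rng.normal(size=(n_candidates, n_exercises))
scores = (np.repeat(exercise_effect, n_dims, axis=1)
          + rng.normal(size=(n_candidates, n_dims * n_exercises)))
summary = mtmm_summary(scores, n_dims, n_exercises)
```

Because the simulated scores contain only an exercise effect, the within-exercise mean in `summary` should sit well above both across-exercise means, the same qualitative pattern Table 1 shows for the graphic rating scale.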

While the level of discriminant validity produced by the behavioral checklist is
approximately the same as that reported by Reilly et al. (1990) (r=.38), the level of convergent
validity does not approximate that found by Reilly et al. (r=.44), as predicted. Instead, the
level of convergent validity in the present study is similar to the level reported in other
assessment center research (e.g., Bycio, 1987; Robertson et al., 1987; Russell, 1987).



Analysis II: Factor Analysis

A second approach to investigating the construct validity of the BCS and graphic rating
scale dimension scores followed from Sackett and Dreher (1982) and Silverman et al. (1986).
Specifically, these researchers examined the underlying dimensionality of assessment center data
by performing a principal-axis factor analysis using a VARIMAX rotation on the intercorrelation
matrix of dimension scores across exercises.
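
As an illustration of the rotation step only, the following is a minimal sketch of a varimax rotation using the common SVD-based iteration. The principal-axis extraction is not shown, the loading matrix is invented for the example, and the function name `varimax` and its parameters are assumptions rather than anything taken from the study.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonally rotate a (variables x factors) loading matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient-like target of the varimax criterion.
        target = rotated**3 - rotated @ np.diag(np.sum(rotated**2, axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt  # nearest orthogonal matrix
        new_criterion = np.sum(s)
        if new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation

# Hypothetical unrotated loadings: a simple two-factor structure
# mixed by an arbitrary rotation of 0.5 radians.
simple = np.array([[0.8, 0.0], [0.7, 0.0], [0.0, 0.6], [0.0, 0.9]])
angle = 0.5
mix = np.array([[np.cos(angle), -np.sin(angle)],
                [np.sin(angle),  np.cos(angle)]])
unrotated = simple @ mix
rotated = varimax(unrotated)
```

Because the rotation is orthogonal, each variable's communality (the row sum of squared loadings) is unchanged; only the distribution of loadings across factors changes.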





In performing the principal axis procedure, both nine- and four-factor solutions were
hypothesized as potentially meaningful based on the number of dimensions and exercises. The
four-factor solution was more interpretable and is presented in Tables 2 and 3 for the behavioral
checklist and graphic rating scales, respectively. In examining the factor structures for the two
scales, it is evident that very distinct exercise factors are present with only a few dimensions
loading on more than one factor. These factor analytic results are consistent with those found
in other research (Sackett & Dreher, 1982; Silverman et al., 1986). In addition, they are
consistent with the findings of the multitrait-multimethod procedure in that both suggest little
evidence of the consistency of dimensional performance across exercises.



Insert Tables 2 and 3 about here 



Analysis III: Internal Consistency and Interrater Reliability

The reliability of the BCS and graphic rating scale dimension scores was assessed by
the methods of internal consistency and interrater reliability. The internal consistency of the
dimension scores for the two scales was computed using Cronbach's coefficient alpha. These
coefficients are presented in Table 4 for both the behavioral checklist and graphic
rating scale. As Table 4 illustrates, the grand mean coefficient alpha is .903 (range=.834 to



Insert Table 4 about here





.945; SD=.036) for the BCS and is .823 (range=.783 to .859; SD=.021) for the graphic rating
scale dimension scores.

Interrater reliability for the behavioral checklist and graphic rating scales was calculated
by correlating the dimension scores for the two assessors. The average interrater reliability for
each dimension on the behavioral checklist and graphic rating scale is presented in Table 5. As
illustrated, the grand mean interrater reliability for the BCS is .976 (range=1.00 to .953;
SD=.014), and the grand mean interrater reliability for the graphic rating scale dimension scores
was .904 (range=.944 to .854; SD=.028).
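
Correlating two assessors' scores on one dimension can be sketched as follows. The scores below are invented for illustration, and the helper name `interrater_r` is hypothetical; the study's own computation may have differed in detail.

```python
import numpy as np

def interrater_r(assessor_a, assessor_b):
    """Pearson correlation between two assessors' scores on one dimension."""
    return float(np.corrcoef(assessor_a, assessor_b)[0, 1])

# Hypothetical dimension scores for six candidates from two assessors.
assessor_a = np.array([12.0, 15.0, 9.0, 20.0, 17.0, 11.0])
assessor_b = np.array([13.0, 15.0, 10.0, 19.0, 18.0, 11.0])
r_ab = interrater_r(assessor_a, assessor_b)
```

Repeating this for each dimension and averaging the coefficients would give a grand mean of the kind reported above.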

The results of the reliability analyses suggest that while both the behavioral checklist and
graphic rating scale yielded moderately high coefficient alphas and interrater reliability
coefficients for the dimension scores, the behavioral checklist produced higher internal
consistency and interrater agreement than did the graphic rating scale. It must be noted,
however, that the higher internal consistency of the behavioral checklist, as compared to the
graphic rating scale, may be related to the number of responses included on the checklist. Thus,
internal consistency may not be appropriate for assessing the reliability of behavioral checklists.
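
The dependence of coefficient alpha on the number of items can be seen in a small simulation. This is a sketch under stated assumptions: the item data are simulated (a common true score plus independent noise), the function name `cronbach_alpha` is ours, and nothing here reproduces the study's actual item-level data.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's coefficient alpha for a (respondents x items) array."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances.sum() / total_variance)

# Simulated items: same true score, equally noisy, differing only in count.
rng = np.random.default_rng(0)
true_score = rng.normal(size=(1000, 1))
short_scale = true_score + rng.normal(size=(1000, 5))
long_scale = true_score + rng.normal(size=(1000, 20))

alpha_short = cronbach_alpha(short_scale)
alpha_long = cronbach_alpha(long_scale)
```

With items of identical quality, the 20-item scale yields a higher alpha than the 5-item scale, which is the reason a long checklist can show high internal consistency for reasons unrelated to rating quality.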

In general, the results of the reliability analyses for both the behavioral checklist and
graphic rating scale are consistent with earlier research reporting moderately high interrater
reliability among assessment center dimension ratings (Neidig & Neidig, 1984).



Insert Table 5 about here 




Analysis IV: Criterion Validity 

Total scores for the assessment center produced by the behavioral checklist and graphic
rating scale were computed by standardizing each dimension score and then weighting it
according to its relative weight in the overall test plan. These standardized and weighted
dimension scores were then summed to produce a total score for both the behavioral checklist
and the graphic rating scale. The correlation between the behavioral checklist and graphic rating
scale total scores was .898.
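
The standardize, weight, and sum procedure can be sketched as below. The dimension scores are simulated and the weights are invented, since the actual test-plan weights are not given here; the function name `total_scores` is likewise hypothetical.

```python
import numpy as np

def total_scores(dimension_scores, weights):
    """Standardize each dimension (z-scores), weight it, and sum per candidate."""
    scores = np.asarray(dimension_scores, dtype=float)
    z = (scores - scores.mean(axis=0)) / scores.std(axis=0, ddof=1)
    return z @ np.asarray(weights, dtype=float)

# Simulated scores for 178 candidates on three dimensions,
# with hypothetical test-plan weights that sum to 1.
rng = np.random.default_rng(1)
example_scores = rng.normal(loc=50, scale=10, size=(178, 3))
example_weights = [0.5, 0.3, 0.2]
totals = total_scores(example_scores, example_weights)
```

Standardizing first keeps a dimension with a wide raw-score range from dominating the total; the weights alone then determine each dimension's contribution.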

The total scores for the two scales were then correlated with candidates' service ratings
for the previous two years. In completing the performance ratings, the candidates' supervisors
rated the candidates on sixteen possible dimensions using a five-point scale in which "5" was
Outstanding and "1" was Unsatisfactory. These dimension ratings were then averaged to
compute an overall performance rating score.

As shown in Table 6, only the graphic rating scale total score had a significant
relationship with candidates' 1992 (r = .163) and 1993 (r = .174) performance ratings. The
relatively modest validity coefficients reported in Table 6 for both the behavioral checklist and



Insert Table 6 about here 



graphic rating scale may be attributed to the lack of variance in the performance ratings. The
mean overall ratings for 1992 and 1993 were 4.701 (SD=.357) and 4.72 (SD=.382),
respectively. This explanation seems most plausible, given that an analysis of the reliability of




the 1992 and 1993 performance ratings yielded coefficient alphas of .739 and .818,
respectively.

As previously stated, Klimoski and Brickner (1987) have suggested that criterion
contamination may be responsible for the established relationship between assessment center
scores and performance ratings. This claim would be investigated by a regression analysis in
which graphic rating scale total scores would be regressed on the candidates' overall
performance rating scores, holding the behavioral checklist total score constant. Such an
analysis could not be performed in the present study due to the relatively weak relationship
between the performance ratings and the behavioral checklist and graphic rating scales, and the
strong intercorrelation of the total scores for the two scales. Under these conditions little effect
for either scale, holding the other constant, would be expected.










Discussion 

The main goal of this study was to investigate the effects of an "untranslated" behavioral
checklist on certain psychometric properties of an assessment center, specifically the construct
validity of assessment center dimension ratings. It was predicted that the behavioral checklist
alone, without the use of the retranslation procedure employed by Reilly et al. (1990), would
reduce the cognitive complexity of the rating task and produce the same pattern of convergent
and discriminant validity reported by Reilly et al. Such findings would eliminate the need for
the retranslation process in the construction of behavioral checklists and would address issues
related to the content validity and fairness of the evaluation process.

The results suggest that the untranslated behavioral checklists improved the discriminant
validity and reliability of assessment center dimension scores over traditional graphic rating
scales, but did not have a corresponding effect on the convergent validity of dimension scores,
as was expected. In addition, the untranslated behavioral checklist, as compared to the graphic
rating scale, did not yield a significant relationship with performance.

While the untranslated behavioral checklist failed to produce a similar level of convergent
validity as that reported by Reilly et al., the level of discriminant validity obtained was
approximately that of Reilly et al. (.39 vs. .41). Moreover, the level of discriminant validity
in the present study is better than that reported in earlier assessment center research. Reilly et
al. provided a summary of the convergent and discriminant validity findings of this research.




This summary is presented in Table 7. An inspection of Table 7 confirms that only Sackett and
Dreher (1982) and Reilly et al. (1990) have obtained levels of discriminant validity below .45.



Insert Table 7 about here



The level of discriminant validity achieved with the untranslated behavioral checklist may
be explained by the cognitive-reduction benefits associated with these scales. As explained
earlier, even without employing the retranslation approach, the behavioral checklists identify
specific behavioral responses and organize them into their relevant dimensions based on the
dimension's operational definition. Research has shown that assessors employ their own
reductionist strategies to contend with the cognitive complexity of the evaluation task by using
only a few dimensions (Gaugler & Thornton, 1989). Eliminating the need to categorize
responses may have reduced the cognitive demands imposed on the assessors, and in turn
decreased the amount of convergence typically found among dimension ratings within
exercises.

Two explanations are offered for the failure of the untranslated behavioral checklist to
similarly affect the convergent validity of dimension ratings. First, the omission of the
retranslation procedure from the behavioral checklist construction process may have resulted in
poorer convergent validity. Specifically, Reilly et al. (1990) postulated that the retranslation
procedure may benefit assessors by providing them with a clearer understanding of the
dimension definitions, and thus enabling them to more effectively identify and categorize
behaviors. Reilly et al., however, rejected this as a possible explanation for their findings,




stating that prior research has not demonstrated that the extent of assessor training moderates
assessment center validity (Gaugler et al., 1987). This explanation is also rejected as a means
of describing the results of the present study. Thus, eliminating the retranslation process, as
proposed, would not be expected to have a negative impact on the convergent validity of
dimension ratings.

A more plausible explanation for the poor convergent validity in the present study relates
directly to the methodology used in Reilly et al.'s investigation. As mentioned earlier, these
authors investigated the convergent validity of two group exercises, both of which involved an
assembly problem. Because responses to exercises are situationally determined, consistency in
dimensional performance could be expected for these two exercises because of their like
contexts, and thus would explain the convergence of dimension ratings across the two exercises.
This explanation is supported by two lines of evidence. First, the factor analytic results of the
present study, which evaluated four contextually different exercises using a behavioral checklist,
clearly showed the presence of distinct exercise factors, not separate dimensions. Similar factor
analytic results have been obtained in other assessment center research that includes multiple
exercises (Sackett & Dreher, 1982; Silverman et al., 1986). Second, the findings of the present
study revealed very different levels of convergent validity than those of Reilly et al. (.25 vs.
.43), even when a behavioral checklist was also employed. Moreover, no other research
investigating the convergent validity of multiple exercises reports convergent validity of this
magnitude (see Table 7). The exercises used in these studies included group exercises, in-baskets,
role plays, and interviews; in no instance were only two exercises similar in format and
content used.




Future research should determine if the gains in convergent validity achieved by Reilly
et al. were due primarily to the selection of exercises similar in situational contexts. If Reilly
et al.'s results cannot be replicated using multiple exercises, then the available evidence supports
what others have identified as an "exercise effect" (Neidig & Neidig, 1984) in which variance
in dimensional performance across exercises is expected due to the different situations in which the
candidate is placed. The abundance of evidence demonstrating such an effect has led to a
recommendation that attempts to measure constructs be abandoned in favor of measuring specific
behavioral responses to work samples designed to simulate important work activities identified
through job analysis (Neidig & Neidig, 1984; Sackett & Dreher, 1984).

In this vein, behavioral checklists would be very appropriate as a method of exercise
evaluation. As the results of the present study indicate, behavioral checklists improved the
internal consistency and interrater reliability of dimension scores, and thus could be expected
to yield similar results for scores on overall exercises. As mentioned earlier, internal
consistency may not be as appropriate as interrater reliability for the evaluation of the behavioral
checklist's reliability because of its tendency to increase in proportion to the number of items
assessed.

The high interrater reliability (r=.976) obtained in the present study may be attributed
to the near objective level of the rating task when employing a behavioral checklist (Reilly et
al., 1990). Not only would the objective nature of the behavioral checklist increase the amount
of agreement among assessors, but it also has the added benefit of reducing the reliance on
consensus discussion, which can be time-intensive. Instead, assessors can independently rate




candidates using the behavioral checklist and limit their discussion to areas in which there is
disagreement.

In addition to increasing the reliability of assessment center ratings, the behavioral
checklist also has other advantages. Foremost among these benefits is its ability to ensure
content validity of the evaluation process. Sackett (1987) points out that in developing
assessment centers, evaluations of content validity are typically made on the basis of the stimulus
materials alone with little attention being given to the scoring process. Developed with the
assistance of subject matter experts, behavioral checklists ensure the identification of content
valid behavioral responses to the exercises. Identifying and retaining all responses elicited
by a particular exercise, even those with a low or zero weight, ensures the most complete
(content valid) scales. As previously mentioned, ensuring the content validity of the rating
process is very important from both the perspectives of face validity and fairness in evaluating
candidates.

Reilly et al. (1990) have suggested that behavioral checklists offer still other benefits that
are related to the evaluation process itself. Specifically, not only is the rating process less
cognitively demanding, but it is also simplified by eliminating the need for raters to categorize
behaviors and discuss their ratings. Likewise, assessor training can also be simplified when
employing behavioral checklists by focusing on the recognition and recall of specific behaviors.
Finally, the feedback process can be enhanced by the use of behavioral checklists in that
assessees can be provided with more specific feedback regarding their performance on an
exercise. It must be mentioned, however, that in reducing the evaluation process to an objective




level, consideration must be given to the attitudes of the assessors regarding the rating process,
especially if they have been selected for their expertise.

In sum, the results of this study suggest that "untranslated" behavioral checklists failed
to produce the levels of convergent validity reported by Reilly et al. (1990). This may
be due to a prevalent exercise effect found when evaluating multiple exercises that are
contextually different. The untranslated behavioral checklist was, however, associated with gains
in discriminant validity and reliability. In addition, its use may increase the content validity of
the assessment process itself. Because of the unreliability and skewness in the distribution of
the performance ratings in the present study, no conclusions could be made regarding the
criterion validity of the behavioral checklist. Future research should examine the differential
validity of various evaluation methods to determine their effect on criterion validity. In addition,
such research would allow for an investigation of the claim that criterion contamination is
responsible for the well-established relationship between assessment center ratings and
performance.



Practical Implications 

While this study suggests that there are many advantages associated with the use of
untranslated behavioral checklists in the evaluation of assessment centers, such as improvements
in certain psychometric characteristics and a clear establishment of the content validity of the
evaluation process, we caution potential users regarding certain practical limitations associated
with behavioral checklists, based on our experience over the last eight years.





As previously mentioned, consideration must be given to assessors' attitudes since the
evaluation process is reduced to a near objective level. Assessors are often selected for their
qualifications, and behavioral checklist scales do not fully utilize this expertise. To determine
what impact the behavioral checklist scales had on assessors' attitudes, a 10-item scale assessing
preferences for using the graphic rating scale or the behavioral checklists was developed and
administered to the assessors participating in the present study. An analysis of the survey's
results showed no significant differences in rater preferences for the two scale types (t(27) = .65).
However, a prevalent theme in the assessors' comments was that they often felt hindered by not
having any discretion when making ratings on the behavioral checklists.

In addition to considering the effects behavioral checklists may have on assessors'
attitudes, the potential user should also be concerned with the time and cost involved in
developing these scales. Employing the behavioral checklist will protract the scale development
process. For example, to ensure the completeness (content validity) of responses on the
behavioral checklist, it is often necessary to increase the number of SMEs involved in scale
construction. It may also be necessary, as in our case, to sample candidates' performances in
an effort to identify a near exhaustive list of content valid responses.

Behavioral checklists can also make the rating process itself more time- and labor-intensive.
Depending on the complexity of the exercise, the number of responses on the
behavioral checklist can be great. As the volume of responses increases, so too does the
difficulty of the rating task. As a result, additional assessors are often needed to compensate
for the time required to rate a single candidate.


In sum, careful consideration should be given to the choice of evaluation methods for
assessment centers. There are many benefits to using behavioral checklists, as suggested.
However, behavioral checklists may not be appropriate in all situations, given the practical
concerns we have described. Specifically, behavioral checklists may not be appropriate for
smaller organizations that do not have the staff and other resources required to support their
development, administration, and scoring. We suggest that behavioral checklists may be most
appropriate for evaluating exercises that are designed to measure very specific behaviors, thus
limiting the number of responses. In addition, we recommend that behavioral checklists be used
to evaluate assessment centers for positions with smaller numbers of candidates. Otherwise,
there will be diminishing returns on the effort expended to obtain better quality data, due to the
time and cost involved in using the behavioral checklist.







References 



Bycio, P., Alvares, K.M., & Hahn, J. (1987). Situational specificity in assessment center
ratings: A confirmatory factor analysis. Journal of Applied Psychology, 74, 478-494.

Byham, W.C. (1980, February). Starting an assessment center the right way. Personnel
Administrator, 27-32.

Campbell, D.T., & Fiske, D.W. (1959). Convergent and discriminant validation by the
multitrait-multimethod matrix. Psychological Bulletin, 56, 81-105.

Dreher, G.F., & Sackett, P.R. (1981). Some problems with applying content validity evidence
to assessment center procedures. Academy of Management Review, 6, 551-560.

Gaugler, B.B., & Thornton, G.C. (1989). Number of assessment center dimensions as a
determinant of assessor accuracy. Journal of Applied Psychology, 74, 611-618.

Gaugler, B.B., Rosenthal, D.B., Thornton, G.C. III, & Bentson, C. (1987). Meta-analysis
of assessment center validity [Monograph]. Journal of Applied Psychology, 72, 493-511.

Haymaker, J.C., & Grant, D.L. (1982). Development of a model for content validation of
assessment centers. Journal of Assessment Center Technology, 1(1), 15-17.

Hunter, J.E., & Hunter, R.F. (1984). Validity and utility of alternative predictors of job
performance. Psychological Bulletin, 96, 72-98.

Jaffee, C.L., & Sefick, J.T. (1980, February). What is an assessment center? Personnel
Administrator, 40-43.

Klimoski, R., & Brickner, M. (1987). Why do assessment centers work? The puzzle of
assessment center validity. Personnel Psychology, 40, 243-260.

Neidig, R.D., & Neidig, P.J. (1984). Multiple assessment center exercises and job relatedness.
Journal of Applied Psychology, 69, 182-186.

Reilly, R.R., Henry, S., & Smither, J.W. (1990). An examination of the effects of using
behavior checklists on the construct validity of assessment center dimensions. Personnel
Psychology, 43, 71-84.




Robertson, I., Gratton, L., & Sharpley, D. (1987). Psychometric properties and the design of
managerial assessment centers: Dimensions into exercises won't go. Journal of Occupational
Psychology, 60, 187-195.

Sackett, P.R., & Dreher, G.F. (1982). Constructs and assessment center dimensions: Some
troubling empirical findings. Journal of Applied Psychology, 67, 401-410.

Sackett, P.R., & Dreher, G.F. (1984). Situational specificity of behavior and assessment center
validation strategies: A rejoinder to Neidig and Neidig. Journal of Applied Psychology, 69, 401-410.

Schmidt, F.L., Ones, D.S., & Hunter, J.E. (1992). Personnel selection. Annual Review of
Psychology, 43, 627-670.

Schmitt, N., Gooding, R.Z., Noe, R.A., & Kirsch, M. (1984). Meta-analysis of validity
studies published between 1964 and 1982 and the investigation of study characteristics. Personnel
Psychology, 37, 407-422.

Schmitt, N., & Noe, R.A. (1983). Demonstration of content validity: Assessment center
example. Journal of Assessment Center Technology, 6(2), 5-11.

Silverman, W.H., Dalessio, A., Woods, S.B., & Johnson, R.L. Jr. (1986). Influence of
assessment center methods on assessors' ratings. Personnel Psychology, 39, 565-578.

Smith, P.C., & Kendall, L.M. (1963). Retranslation of expectations: An approach to the
construction of unambiguous anchors for rating scales. Journal of Applied Psychology, 47, 149-155.

Turnage, J.J., & Muchinsky, P.M. (1982). Transituational variability in human performance within
assessment centers. Organizational Behavior and Human Performance, 30, 174-200.








TABLE 1 

Dimension and Exercise Correlations for the Behavioral Checklist and
the Graphic Rating Scale

                                                      Behavioral Checklist    Graphic Rating Scale
                                                      Mean r      SD          Mean r      SD

Dimension (Monotrait-Heteromethod Correlations)
  Interpersonal                                       .262        .064        .270        .079
  Development of Subordinates                         .295        .071        .315        .086
  Leadership & Delegation                             .219        .107        .314        .058
  Problem Analysis & Decision Making                  .285        .071        .292        .073
  Organization, Coordination, & Resource Allocation   .336        .147        .317        .042
  Investigation & Police Work                         .282        .118        .300        .090
  Oral Communication                                  .238        .065        .236        .044
  Control & Follow-Up                                 .284        .057        .376        .106
  Use of Police References & Quantitative Resources   .079        .098        .359        .030
  Grand Mean and SD                                   .254        .074        .301        .042

Heterotrait-Heteromethod Correlations
  Grand Mean and SD                                   .236        .090        .297        .074

Exercise (Heterotrait-Monomethod Correlations)
  Apprehending Suspects Situational Video             .478        .237        .559        [illegible]
  Problem Subordinate Situational Video               .273        .111        .562        .106
  Protest Situational Video                           .341        .107        .615        .063
  In-Basket                                           .482        .140        .719        .082
  Grand Mean and SD                                   .393        .103        .613        .075






TABLE 2 

Rotated Factor Pattern for the Behavioral Checklist Scores (BCS)

Dimensions                                          Exercises               I     II    III   IV

Interpersonal                                       Apprehending Suspects   .69
Development of Subordinates                         Apprehending Suspects   .42
Leadership & Delegation                             Apprehending Suspects   .79
Problem Analysis & Decision Making                  Apprehending Suspects   .89
Organization, Coordination, & Resource Allocation   Apprehending Suspects   .86
Investigation & Police Work                         Apprehending Suspects   .52
Oral Communication                                  Apprehending Suspects   .61
Control & Follow-Up                                 Apprehending Suspects   .73
Use of Police References & Quantitative Resources   Apprehending Suspects               .41

Interpersonal                                       Problem Subordinate     .39         .36
Development of Subordinates                         Problem Subordinate                 .48
Leadership & Delegation                             Problem Subordinate                 .51
Problem Analysis & Decision Making                  Problem Subordinate                 .55
Investigation & Police Work                         Problem Subordinate                 .47
Oral Communication                                  Problem Subordinate
Control & Follow-Up                                 Problem Subordinate                 .52
Use of Police References & Quantitative Resources   Problem Subordinate                 .45

Interpersonal                                       Protest                                   .59
Development of Subordinates                         Protest                             .51
Leadership & Delegation                             Protest                                   .51
Problem Analysis & Decision Making                  Protest                                   .74
Organization, Coordination, & Resource Allocation   Protest                       .37   .43   .40
Investigation & Police Work                         Protest
Oral Communication                                  Protest                                   .60
Control & Follow-Up                                 Protest                                   .61

Interpersonal                                       In-Basket
Development of Subordinates                         In-Basket
Leadership & Delegation                             In-Basket
Problem Analysis & Decision Making                  In-Basket                     .79
Organization, Coordination, & Resource Allocation   In-Basket                     .73
Investigation & Police Work                         In-Basket                     .66
Oral Communication                                  In-Basket
Control & Follow-Up                                 In-Basket
Use of Police References & Quantitative Resources   In-Basket

Note. Only factor loadings greater than or equal to [illegible] are presented.






TABLE 3 

Rotated Factor Pattern for the Graphic Rating Scale Dimension Scores 



Dimensions                                          Exercises

Interpersonal                                       Apprehending Suspects
Development of Subordinates                         Apprehending Suspects
Leadership & Delegation                             Apprehending Suspects
Problem Analysis & Decision Making                  Apprehending Suspects
Organization, Coordination, & Resource Allocation   Apprehending Suspects
Investigation & Police Work                         Apprehending Suspects
Oral Communication                                  Apprehending Suspects
Control & Follow-Up                                 Apprehending Suspects
Use of Police References & Quantitative Resources   Apprehending Suspects

Interpersonal                                       Problem Subordinate
Development of Subordinates                         Problem Subordinate
Leadership & Delegation                             Problem Subordinate
Problem Analysis & Decision Making                  Problem Subordinate
Investigation & Police Work                         Problem Subordinate
Oral Communication                                  Problem Subordinate
Control & Follow-Up                                 Problem Subordinate
Use of Police References & Quantitative Resources   Problem Subordinate

Interpersonal                                       Protest
Development of Subordinates                         Protest
Leadership & Delegation                             Protest
Problem Analysis & Decision Making                  Protest
Organization, Coordination, & Resource Allocation   Protest
Investigation & Police Work                         Protest
Oral Communication                                  Protest
Control & Follow-Up                                 Protest

Interpersonal                                       In-Basket
Development of Subordinates                         In-Basket
Leadership & Delegation                             In-Basket
Problem Analysis & Decision Making                  In-Basket
Organization, Coordination, & Resource Allocation   In-Basket
Investigation & Police Work                         In-Basket
Oral Communication                                  In-Basket
Control & Follow-Up                                 In-Basket
Use of Police References & Quantitative Resources   In-Basket

I     II    III   IV

~M ~40 

.79 

.81 

.80 

.76 

.7^ 

.35 .72 



.56 

.75 

.81 

.80 

.63 

.72 

.74 

75 

.71 

.75 

.82 

.80 

.76 

R7 

75 

.80 

by 

86 

89 

87 

36 

82 

77 

85 

87 



Note, Oaly factor loAdingu greater thsm or equal to .35 are 







TABLE 4

Coefficient Alpha by Dimension and Total Score for the
Behavioral Checklist and Graphic Rating Scale

                                                     Behavioral Checklist   Graphic Rating Scale
Dimension                                            Coefficient Alpha      Coefficient Alpha

Interpersonal                                        [illegible]            .818
Development of Subordinates                          .938                   .839
Leadership & Delegation                              .900                   .839
Problem Analysis & Decision Making                   .927                   .820
Organization, Coordination, & Resource Allocation    .909                   .800
Investigation & Police Work                          .945                   .828
Oral Communication                                   .858                   .783
Control & Follow-Up                                  .933                   .859
Use of Police References & Quantitative Resources    .834                   .823
Grand Mean                                           .903                   .823
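
Coefficient alpha of the kind reported in Table 4 can be illustrated with a short computation. The following Python sketch is not the authors' procedure: the function name coefficient_alpha and the example scores are hypothetical, and it assumes that the "items" entering alpha for a dimension are the scores a candidate received on that dimension in each exercise. It applies the standard formula alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores).

```python
from statistics import variance


def coefficient_alpha(scores):
    """Cronbach's coefficient alpha.

    `scores` is a list of candidates; each candidate is a list of k item
    scores (assumed here to be one score per exercise).
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item (column) across candidates.
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each candidate's total score.
    total_variance = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: 5 candidates rated on one dimension in 4 exercises.
example = [
    [3, 4, 3, 4],
    [5, 5, 4, 5],
    [2, 2, 3, 2],
    [4, 4, 4, 5],
    [1, 2, 1, 2],
]
print(round(coefficient_alpha(example), 3))
```

Alpha approaches 1 as the exercise scores for a dimension rise and fall together across candidates, which is the sense in which the higher checklist values in Table 4 indicate more internally consistent dimension scores.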






TABLE 5

Interrater Reliability by Dimension for the
Behavioral Checklist and Graphic Rating Scale

                                                     Behavioral Checklist    Graphic Rating Scale
Dimension                                            Mean r      SD          Mean r      SD

Interpersonal                                        1.000       .000        .943        .048
Development of Subordinates                          .977        .026        .934        .074
Leadership & Delegation                              .978        .045        .930        .089
Problem Analysis & Decision Making                   .962        .012        .902        .064
Organization, Coordination, & Resource Allocation    .988        .007        .893        .094
Investigation & Police Work                          .964        .030        .897        .067
Oral Communication                                   .953        .025        .873        .048
Control & Follow-Up                                  .973        .001        .913        .083
Use of Police References & Quantitative Resources    .992        .021        .853        .108
Grand Mean                                           .976        .014        .904        .028
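
The Mean r and SD columns of Table 5 summarize correlations between raters. As a rough illustration only, the sketch below computes the Pearson correlation for every pair of raters and then the mean and standard deviation of those correlations. How the study aggregated its correlations (for example, across rater pairs or across exercises) is not stated in this section, so the aggregation, the function names pearson_r and interrater_summary, and the ratings are all assumptions.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def interrater_summary(ratings_by_rater):
    """Mean and SD of the correlations between all pairs of raters.

    `ratings_by_rater` holds one list per rater, each giving that rater's
    scores for the same candidates in the same order.
    """
    rs = [pearson_r(a, b) for a, b in combinations(ratings_by_rater, 2)]
    sd = stdev(rs) if len(rs) > 1 else 0.0
    return mean(rs), sd


# Hypothetical ratings: 3 raters scoring the same 6 candidates on one dimension.
ratings = [
    [4, 2, 5, 3, 1, 4],
    [4, 3, 5, 3, 1, 5],
    [5, 2, 4, 3, 2, 4],
]
mean_r, sd_r = interrater_summary(ratings)
print(round(mean_r, 3), round(sd_r, 3))
```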









TABLE 6

Correlations Between Performance Ratings and Behavioral Checklist
and Graphic Rating Scale Total Scores

                            Behavioral Checklist   Graphic Rating Scale

1992 Performance Ratings    .1425                  .1625*
1993 Performance Ratings    .1376

Note. *Significant at p < .05 level. N = 173 for 1992 Performance Ratings and N = 149 for the 1993 Performance
Ratings.
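
The significance flag in Table 6 can be checked with the usual t test for a correlation coefficient, t = r * sqrt(N - 2) / sqrt(1 - r^2), with N - 2 degrees of freedom. The sketch below applies it to the three legible correlations and the sample sizes given in the note. It assumes a two-tailed test at the .05 level, and the critical value of 1.98 is an approximate table value for roughly 150 to 170 degrees of freedom rather than a figure from the paper. The function name t_from_r is hypothetical.

```python
from math import sqrt


def t_from_r(r, n):
    """t statistic for testing a zero correlation, with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)


# Approximate two-tailed critical t at the .05 level for about 150 to 170
# degrees of freedom (an assumed table value, not taken from the paper).
T_CRITICAL = 1.98

for label, r, n in [
    ("1992, r = .1425", 0.1425, 173),
    ("1992, r = .1625", 0.1625, 173),
    ("1993, r = .1376", 0.1376, 149),
]:
    t = t_from_r(r, n)
    print(f"{label}: t = {t:.2f}, exceeds critical value: {abs(t) > T_CRITICAL}")
```

Under these assumptions only the .1625 correlation exceeds the critical value, which agrees with the single asterisk in the table.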






TABLE 7

Convergent and Discriminant Validity Results of
Assessment Center Research Summarized by Reilly et al. (1990)

                                 Average               Average
Source                           Convergent Validity   Discriminant Validity

Reilly et al. (1990)             .43                   .41
Sackett & Dreher (1982)
   Company A                     .07                   .64
   Company B                     .11                   .40
   Company C                     .51                   .65
Turnage & Muchinsky (1982)
   Sample A                                            .53
   Sample B                      .44                   .52
Silverman et al. (1986)
   Sample A                      .54                   .65
   Sample B                      .37                   .68
Russell (1987)                   .25                   .53
Bycio et al. (1987)              .36                   .75
Robertson et al. (1987)
   Organization 1                .23                   .64
   Organization 2                .26                   .66
   Organization 3                .23                   .60
   Organization 4                .11                   .49
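
As these terms are commonly used in this literature, average convergent validity is the mean correlation between ratings of the same dimension obtained in different exercises, and average discriminant validity is the mean correlation between ratings of different dimensions within the same exercise, so a lower discriminant value is the more favorable one. The sketch below shows one way such averages could be computed. It is an illustration under those assumed definitions, not the procedure of any study in Table 7; the function names and the toy scores are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_validities(scores):
    """Return (average convergent, average discriminant) correlations.

    `scores` maps (exercise, dimension) to a list of candidate scores,
    with candidates in the same order in every list.
    """
    convergent, discriminant = [], []
    for (key_a, a), (key_b, b) in combinations(scores.items(), 2):
        (ex_a, dim_a), (ex_b, dim_b) = key_a, key_b
        if dim_a == dim_b and ex_a != ex_b:
            # Same dimension rated in different exercises.
            convergent.append(pearson_r(a, b))
        elif ex_a == ex_b and dim_a != dim_b:
            # Different dimensions rated within the same exercise.
            discriminant.append(pearson_r(a, b))
    return mean(convergent), mean(discriminant)


# Toy data: 4 candidates, 2 exercises, 2 dimensions (not from the study).
example_scores = {
    ("Protest", "Oral Communication"): [1, 2, 3, 4],
    ("Protest", "Interpersonal"): [1, 3, 2, 4],
    ("In-Basket", "Oral Communication"): [1, 2, 3, 4],
    ("In-Basket", "Interpersonal"): [1, 3, 2, 4],
}
conv, disc = average_validities(example_scores)
print(round(conv, 2), round(disc, 2))
```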




Appendix

Definitions of Assessment Center Dimensions

1. Interpersonal - the ability to use human relations skills in interacting with subordinates,
superiors, citizens, and other personnel within the department and outside agencies

2. Development of Subordinates - the ability to develop subordinates by establishing
guidelines, observing behavior, and providing feedback, counseling, or disciplinary
actions

3. Leadership and Delegation - the ability to direct activities of subordinates in order to
achieve departmental goals

4. Problem Analysis & Decision Making - the ability to identify potential and existing
problems and to make high quality, timely decisions

5. Organization & Coordination - the ability to organize and coordinate resources on scene
and administratively

6. Investigation & Police Work - the ability to ask questions that obtain information to
further an investigation and to perceive critical information and determine when to use
different techniques

7. Oral Communication - the ability to communicate ideas, orders, and assignments orally
to a wide variety of people

8. Control and Follow-Up - the ability to follow up on goals, assignments, unsolved and
ongoing problems and projects

9. Use of Police References and Quantitative Resources - the ability to use police resources
as guides in decision making and application




